Schibsted YAMS

How to build and maintain a service handling thousands of requests per second with minimal dedication

daniel(dot)caballero at schibsted(dot)com

Who are you?



Daniel Caballero

Devops/SRE Engineer @ Schibsted

Part time (Devops) lecturer @ La Salle University

So... I work

... I (some kinda) teach

... I (try to) program...

... I (would like to) rock...

... and I live

So... I value my time (a lot)

And I really don't like to waste it

  • In resolving incidents
  • In repetitive work

Schibsgrñvahed..WHAT??

What is Schibsted?

Marketplaces global expansion

And SPT?

It's about convergence through global solutions

What's behind global components / services?

You build it, you run it

Probably nothing new on the horizon for you:

  • First mentioned in 2006 by Werner Vogels / Amazon
  • Nice elaboration of the rationale here

That means there's no ops/support/systems/devops team.

{
    "format": "webp",
    "watermark": {
        "location": "north",
        "margin": "20px",
        "dimension": "20%"
    },
    "actions": [
        {
            "resize": {
                "width": 300,
                "fit": {
                    "type": "clip"
                }
            }
        }
    ],
    "quality": 90
}
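
As a rough illustration of how a rule like the one above could be interpreted, the following Python sketch derives the output size and encoding settings for a source image. It is not YAMS code: the function names are hypothetical, and it assumes that the "clip" fit type means scaling to the requested width while preserving the aspect ratio, and that the watermark "dimension" is a percentage of the output width.

```python
import json

# The rule shown above, embedded so the sketch is self-contained.
RULE = json.loads("""
{
    "format": "webp",
    "watermark": {"location": "north", "margin": "20px", "dimension": "20%"},
    "actions": [{"resize": {"width": 300, "fit": {"type": "clip"}}}],
    "quality": 90
}
""")


def clip_size(src_w, src_h, width):
    """Assumed 'clip' semantics: match the target width, keep the aspect ratio."""
    return width, round(src_h * width / src_w)


def plan(rule, src_w, src_h):
    """Return the output parameters implied by the rule (hypothetical helper)."""
    w, h = src_w, src_h
    for action in rule.get("actions", []):
        resize = action.get("resize")
        if resize and resize["fit"]["type"] == "clip":
            w, h = clip_size(w, h, resize["width"])
    out = {"format": rule["format"], "quality": rule["quality"], "size": (w, h)}
    mark = rule.get("watermark")
    if mark:
        # Assumption: "20%" is relative to the output width.
        pct = int(mark["dimension"].rstrip("%"))
        out["watermark_width"] = w * pct // 100
    return out


print(plan(RULE, 1200, 800))
```

Keeping the planning step separate from the pixel work means a rule can be validated and tested without decoding any image.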

Why not offline transformations?

Lots of (user) content, given the classifieds business. Sites are dynamic by nature. Some of them adapt the request to the device. Redesigns or improvements get blocked for lack of capacity to reprocess.

This may sound familiar to you...

CDNs able to transform contents: * *?

SaaS solutions:

  • imgix
  • libpixel
  • Cloudinary

Opensource solutions:

  • Imbo
  • imaginary
  • picfit

So why?

  • Sites were already doing that, so it saves the sites time
  • Close to the Schibsted sites
    • not just latency (multiregion); also feature-set, compliance...
  • Cost effective
  • Adapting to other needs:
    • Document transformation
    • Video streaming

What are you very proud about?

High usage

Does not require high maintenance

(Almost) No incidents. We would be able to maintain this with a single engineer

But be careful: if you invest zero effort, you kill a service:

  • It stops being competitive
  • It starts being legacy
  • It starts to disconnect from current needs

So we try to convince the company it requires, at least, the focus of two engineers.

Low costs

Low latency

Not a new story... why not present it before?

How did you achieve that?

Everything as code mindset

No space for "one time" actions.

Continuous Delivery

And the capacity to incorporate everything into the pipeline

Good design choices

  • AWS + Netflix stack + Microservices
  • libvips

0-error target

Yeah, Google and error budgets...

... but it helped us to understand, tune, and earn the trust of Schibsted sites, avoiding major disruptions during major onboardings
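
For context on the error-budget idea mentioned above: a budget is simply the share of requests that an availability target allows to fail. A minimal Python sketch of that arithmetic follows; the function name and the example figures are illustrative and are not YAMS numbers.

```python
def error_budget(slo, total_requests):
    """Number of requests allowed to fail under an availability target `slo` (0..1)."""
    return round((1 - slo) * total_requests)


# A 99.9% target over one million requests tolerates about a thousand failures...
print(error_budget(0.999, 1_000_000))
# ...while a 0-error target leaves no budget at all.
print(error_budget(1.0, 1_000_000))
```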

Troubleshooting toolkit

Nice solution... but

Why not docker/k8s?

  • Local tests
  • YAMS Portal/Frontend already there
  • Migration exercise

gRPC?

Why not a Service Mesh?

And Prometheus?

We may.

And it may be a good moment to consider opencensus.

Actual (& not so far) future

More elasticity to reduce costs

  • Changes in transformation rules mean massive eviction
    • So we are a bit overscaled...
  • Better degradation and more efficient ASG triggers
    • Reusing cache if no capacity
    • Automatic ASG parameters adjustments
    • Minimize parallelization in the transformation pipe
    • Incoming queue
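
The "reusing cache if no capacity" idea can be sketched as follows. This is an assumption about how such degradation might work, not the YAMS implementation: the class and method names are hypothetical, a semaphore stands in for transformation capacity, and a plain dictionary stands in for the cache.

```python
import threading


class DegradingTransformer:
    """Serve a previously cached result when no transformation slot is free."""

    def __init__(self, max_concurrent, transform):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._transform = transform
        self._cache = {}  # key -> last result, possibly produced under older rules

    def get(self, key, source):
        # Try to take a slot without waiting; failing means we are saturated.
        if not self._slots.acquire(blocking=False):
            if key in self._cache:
                return self._cache[key], "stale"
            raise RuntimeError("no capacity and nothing cached")
        try:
            result = self._transform(source)
            self._cache[key] = result
            return result, "fresh"
        finally:
            self._slots.release()


transformer = DegradingTransformer(max_concurrent=1, transform=str.upper)
print(transformer.get("ad-1", "image"))
```

Under this scheme a rule change would mark entries as stale rather than evict them outright, so a saturated fleet can still answer with a slightly outdated variant instead of an error.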

Extra compression

  • Currently libjpeg-turbo
  • Good for performance, pretty decent results, but...
  • MozJPEG, API-compatible with libjpeg
  • guetzli, from Google

Bringing the service closer to the business

  • Image uploader
  • Online image editor
  • Integration with data services
    • Automatic classification
    • Nudity detector
    • Car plate pixelation

  • Video transcoding...

Actual transformation pipelines

More adoption?

Some major Marketplaces are not using the service, yet

Simulating dependency failures

Hoverfly: similar in concept to the Simian Army from Netflix, but specialized in API degradations
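
The general idea of injecting API degradations can be illustrated without any particular tool. The sketch below is plain Python and does not use Hoverfly or its API; every name in it is hypothetical. It wraps a dependency call so that a configurable share of calls fail, which lets a test check that the caller falls back gracefully.

```python
import random


class DependencyError(Exception):
    """Raised in place of a real dependency failure."""


def degrade(call, failure_rate, rng=None):
    """Wrap `call` so that roughly `failure_rate` of invocations fail."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise DependencyError("injected failure")
        return call(*args, **kwargs)

    return wrapper


def fetch_original(key):
    """Stand-in for a call to an upstream storage API (hypothetical)."""
    return f"bytes-of-{key}"


def fetch_with_fallback(fetch, key, placeholder="placeholder"):
    """Caller under test: degrade to a placeholder when the dependency fails."""
    try:
        return fetch(key)
    except DependencyError:
        return placeholder


always_failing = degrade(fetch_original, failure_rate=1.0)
print(fetch_with_fallback(always_failing, "ad-1"))
```

A proxy-based tool applies the same principle at the network level, so the service under test needs no code changes.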

Stress test as part of the pipeline

Before closing...

Are you going to opensource it?

  • Schibsted does support contributing to opensource projects
  • As well as releasing internal code
  • Problem: Not following a "contribute-first" approach
  • But we have already contributed to bimg, zuul, KrakenD...

Are you going to offer this SaaS to other companies?

Latencymap

api noiser

Corollary

Be Rx in the code...

But not in real life

Great thanks...

Sch*

And especially...

Edge colleagues

Other Qs?